A SYSTEM FOR ERROR-RESILIENCE IN COMMUNICATION OF AUDIO-VISUAL OBJECTS

CROSS-REFERENCE TO RELATED PATENT APPLICATIONS 

This patent application claims priority from and is related to U.S. Provisional
Patent Application Serial Number 60/288,081, filed May 3, 2001, which is incorporated
herein by reference in its entirety.

FIELD OF THE INVENTION 

The present invention relates to computer network-based multimedia applications 
in general, and more particularly to error resilience of Systems streams in MPEG-4. 

BACKGROUND OF THE INVENTION 

ISO/IEC 14496, commonly referred to as "MPEG-4", is an international standard
for the communication of interactive audio-visual scenes. This specification includes the
following elements:

1. The coded representation of natural or synthetic, two-dimensional (2D) or
three-dimensional (3D) objects that can be manifested audibly and/or visually
(audio-visual objects) (specified in parts 1, 2 and 3 of ISO/IEC 14496);

2. The coded representation of the spatio-temporal positioning of audio-visual objects as 
well as their behavior in response to interaction (scene description, specified in part 1 
of ISO/IEC 14496); 
3. The coded representation of information related to the management of data streams
(synchronization, identification, description and association of stream content,
specified in part 1 of ISO/IEC 14496); and

4. A generic interface to the data stream delivery layer functionality (DMIF, specified
in part 6 of ISO/IEC 14496).

The overall operation of a system communicating audio-visual scenes can be 
paraphrased as follows: 

At the sending terminal, the audio-visual scene information is compressed,
supplemented with synchronization information and passed on to a delivery layer, which
multiplexes it into one or more coded binary streams that are transmitted or stored.

At the receiving terminal, these streams are de-multiplexed and decompressed. The 
audio-visual objects are composed according to the scene description and synchronization 
information and presented to the end user. 

The end user may have the option to interact with this presentation. Interaction
information can be processed locally or transmitted back to the sending terminal.

ISO/IEC 14496 defines the syntax and semantics of the bit-streams that convey 
such scene information, as well as the details of their decoding processes. 

Scene description addresses the organization of audio-visual objects in a scene, in
terms of both spatial and temporal attributes. This information allows the composition
and rendering of individual audio-visual objects after the respective decoders have
reconstructed the streaming data for them. The scene description is represented using a
parametric approach (BIFS - Binary Format for Scenes). The description consists of an
encoded hierarchy (graph) of nodes with attributes and other information (including event
sources and targets). Leaf nodes in this graph correspond to elementary audio-visual data,
whereas intermediate nodes (scene description nodes) group this material to form
audio-visual objects and perform grouping, transformation, and other such operations on
audio-visual objects. The scene description can evolve over time by using scene
description updates.

In order to facilitate active user involvement with the presented audio-visual
information, ISO/IEC 14496-1 provides support for user and object interactions.
Interactivity mechanisms are integrated with the scene description information, in the
form of linked event sources and targets (routes), as well as sensors (special nodes that
can trigger events based on specific conditions). These event sources and targets are part
of scene description nodes, thus allowing close coupling of dynamic and interactive
behavior with the specific scene at hand.

Some objects can be stand-alone objects, like "furniture" or "globe" (Figs. 2 and
3), in which the object description contains all the information necessary to present the
object. Other objects require a stream of additional data, dynamically consumed during
the presentation. Such is, for instance, the "voice" object (Figs. 2 and 3), which gets its
content from an audio stream. So, in addition to the scene description described above,
the MPEG-4 standard provides a mechanism for defining stream objects and for linking
them to the scene objects. This is known as the Object Description Protocol (ODP or
OD).

The Synchronization Layer (SL) is the part of the standard that deals with the
delivery of streams between MPEG-4 devices. A stream is made of "access units"
(AUs), which are the smallest data units to which time attributes can be applied, such as
frames in video streams. Access units are packetized in "SL-packets". The packets
consist of SL-packet headers and the packet payload, i.e. the data. SL-packet headers
contain the information necessary for the synchronization of data, such as time stamps.
The headers may also contain information used by error-resilience tools.

The scene description itself can be dynamic, i.e. streamed. A BIFS stream is made
up of an initial Scene Replace command followed by a series of BIFS-update commands.
A BIFS-update command can insert objects into the scene, remove them, or change
object properties. This mechanism can be used to create graphic animations. A special
type of BIFS stream, called Animation Stream, can be used for creating high-quality,
low-bandwidth animations.

The "scene carousel", also called "BIFS carousel", is a mechanism that allows the
use of dynamic scenes in broadcast environments. In broadcast scenarios, the full scene
description must be supplied periodically, so that terminals that tune in in the middle of
a session will be able to construct the presentation. On the other hand, it is desirable
that terminals that are already tuned in receive only scene updates. This is necessary
because the user at the receiving terminal sometimes interacts with the scene and
changes it locally, and these local changes might be lost if a full scene refresh is
processed.

Another use of the scene carousel is in situations where data is transmitted over
unreliable networks. In this case data, including scene updates, can be lost, and therefore
a periodic full scene refresh is necessary to recover from such losses.

The scene carousel is constructed using a tool provided by the Synchronization
Layer. SL-packet headers may contain a field called "AU_sequenceNumber". This field
is regarded as the semantic sequence number of the access unit. When the terminal
encounters two consecutive access units with the same sequence number, it understands
that the second carries the same information as the first one and can therefore be ignored.

In a scene carousel, a sequence of scene updates is followed by a Scene Replace
command that conveys the full description of the scene. The scene, as described by the
Scene Replace command, is identical to the scene as described by the preceding
accumulated updates; therefore the command is delivered as an access unit with the same
sequence number as the preceding access unit. Terminals that have successfully
processed the update commands will ignore the Scene Replace command, while terminals
that need a full scene refresh, whether because they have just tuned in or because they
lost data on the network, skip the updates and process the Scene Replace command.
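To make this prior-art behavior concrete, the following Python sketch models a carousel-aware receiver. It is an illustration under stated assumptions, not the normative ISO/IEC 14496-1 decoding process: the `AccessUnit` type and its fields are hypothetical, the updates are assumed to be numbered consecutively, and the `lost` flag simply simulates a packet that never arrived.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessUnit:
    """Hypothetical, simplified view of one SL-packetized access unit."""
    au_seq: int               # AU_sequenceNumber from the SL-packet header
    scene_replace: bool       # True if the payload is a BIFS Scene Replace command
    label: str                # description used only for tracing
    lost: bool = False        # simulation only: the packet never arrived


def carousel_receiver(stream: List[AccessUnit]) -> List[str]:
    """Return the labels of the access units a terminal actually processes."""
    processed: List[str] = []
    last_seq: Optional[int] = None  # AU_sequenceNumber of the last AU received
    in_sync = False                  # False until a Scene Replace has been applied
    for au in stream:
        if au.lost:
            # Any lost AU means the local scene may be wrong: wait for a refresh.
            in_sync = False
            continue
        if in_sync:
            # Same sequence number as the previous AU: same information, ignore it.
            if au.au_seq != last_seq:
                processed.append(au.label)
        elif au.scene_replace:
            # Out of sync: only a full Scene Replace can restore the scene.
            processed.append(au.label)
            in_sync = True
        last_seq = au.au_seq
    return processed


# Access units of Fig. 4, followed by the Scene Replace the carousel repeats
# (assumed here to reuse the sequence number of the preceding access unit).
fig4 = [
    AccessUnit(1, True, "replace: state 1"),
    AccessUnit(2, False, "move body"),
    AccessUnit(3, False, "move body again"),
    AccessUnit(4, False, "insert globe"),
    AccessUnit(5, False, "move globe"),
    AccessUnit(5, True, "replace: state 5"),
]
```

Run on `fig4` as is, the receiver applies every update and ignores the repeated Scene Replace. If the second access unit is marked `lost`, it discards all later updates and recovers only at the final Scene Replace, which is the penalty examined below for Figs. 1A to 1E.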

The mechanism is called "BIFS carousel" because it is in common use for BIFS 
and Animation streams, but since the SL is a general tool in MPEG-4, it can be used for 
any kind of stream. 

The BIFS carousel works well for the broadcast scenario, but it turns out that it
does not serve the error-resilience requirements very well. The problem is that every
time an update command is lost, the player must wait for the next Scene Replace
command and then reset the scene. In many cases, this penalty is far too heavy, as
illustrated in an example by Figs. 1A to 1E, which are schematic simplified pictorial
illustrations of five states of a virtual world.

In Fig. 1A, the virtual world contains a person in front of a blackboard and a desk.
In Figs. 1B and 1C the person moves. In Fig. 1D a globe is added to the world, and in
Fig. 1E the globe is moved.
Figs. 2 and 3 are schematic illustrations of MPEG-4 scene descriptions that
describe the virtual worlds of the given example. In practice, each of these objects would
require a complete branch made up of a node hierarchy, but each is presented here as a
single node for the sake of simplification. Fig. 2 illustrates the BIFS nodes that construct
the initial state of the world, as shown in Fig. 1A, and Fig. 3 illustrates the BIFS nodes
that construct the final state of the world, as shown in Fig. 1E.

Further, with respect to the same example, reference is now made to Fig. 4, which
is a schematic simplified illustration of a set of access units that convey the virtual world
of Fig. 1A and all subsequent changes of state, up to and including state 5 (Fig. 1E). The
first access unit in the set contains a Scene Replace command that conveys the entire
scene of state 1 (Fig. 1A). The second and third access units contain Update Field
commands that change the position of the person's body into state 2 (Fig. 1B) and then
state 3 (Fig. 1C). The fourth access unit inserts a new node, which is the globe. The fifth
access unit changes the position of the globe.

If, say, access unit 2 is lost, the receiving terminal waits for the next full Scene
Replace command to reset the scene, when actually access unit 2 could have been ignored
and the next Update Field command (access unit 3) could have been used to move the
person's body to the correct place. This would have been much better than freezing the
scene until the next Scene Replace, and then spending precious time on resetting the entire
scene and losing changes made locally by the viewer.
SUMMARY OF THE INVENTION

According to teachings of the present invention there is provided an error-resilient
mechanism for processing scene description streams in scenarios that involve
transmission of interactive MPEG-4 scenes over unreliable networks, using ISO/IEC
14496-1 protocol having SL-packetized streams, wherein each SL packet has a header
comprising access units having an AU_sequenceNumber field and a
RandomAccessPointFlag field, comprising:

defining a numeration indicator field in said SL packet header; 

defining a synchronization point flag field in said SL packet header; and 
incrementing the numeration indicator of an access unit in said SL packet header if
and only if said access unit conveys a fundamental scene change.

Additionally, according to teachings of the present invention, a scene change is
considered fundamental if its loss inhibits correct processing of subsequent data.

Additionally, according to teachings of the present invention, setting the
synchronization point flag of a first access unit to 1 and setting the numeration indicator
of said first access unit equal to the numeration indicator of a second access unit exactly
preceding said first access unit indicates no change of scene over said second access unit.

Additionally, according to teachings of the present invention, setting the
synchronization point flag of a first access unit to 1 and setting the numeration indicator
of said first access unit equal to the numeration indicator of a second access unit exactly
preceding said first access unit provides a synchronization point for lost data.
Additionally, according to teachings of the present invention, there is provided a
method of processing scene description streams in scenarios that involve transmission of 
interactive MPEG-4 scenes, in a receiving terminal, said scene description streams using 
ISO/IEC 14496-1 protocol having SL-packetized streams, wherein each SL packet has a 
header comprising access units having an AU_sequenceNumber field and a 
RandomAccessPointFlag field, comprising the steps of: 

defining a numeration indicator field in said SL packet header; 

defining a synchronization point flag field in said SL packet header; 

receiving a first access unit; 

checking if a second access unit, exactly preceding said first access unit has been 
received; 

checking if the numeration indicator of said first access unit is different from the
numeration indicator of a last access unit received before said first access unit;

processing said first access unit if said second access unit has been received and the 
numeration indicator of said first access unit is different from the numeration indicator of 
said last access unit; 

checking if the synchronization point flag of said first access unit is set, if said second
access unit has been received and the numeration indicator of said first access unit is not
different from the numeration indicator of said last access unit, or if said second access unit
has not been received;

if said synchronization point flag is set, processing said first access unit if said second
access unit has not been received and the numeration indicator of said first access unit is
different from the numeration indicator of said last access unit; and
if said synchronization point flag is not set, processing said first access unit if said
second access unit has been received and the numeration indicator of said first access unit is 
equal to the numeration indicator of said last access unit, or if said second access unit has not 
been received and the numeration indicator of said first access unit is equal to the numeration 
indicator of said last access unit. 

Additionally, according to teachings of the present invention, the numeration 
indicator field is defined as said AU_sequenceNumber field and said synchronization 
point flag field is defined as said RandomAccessPointFlag field. 
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the 
following detailed description taken in conjunction with the appended drawings in which: 

Fig. 1A is a schematic pictorial illustration of state 1 of a virtual world, useful in
understanding the present invention;

Fig. 1B is a schematic pictorial illustration of state 2 of the virtual world, useful in
understanding the present invention;

Fig. 1C is a schematic pictorial illustration of state 3 of the virtual world, useful in
understanding the present invention;

Fig. 1D is a schematic pictorial illustration of state 4 of the virtual world, useful in
understanding the present invention;

Fig. 1E is a schematic pictorial illustration of state 5 of the virtual world, useful in
understanding the present invention;

Fig. 2 is a schematic simplified illustration of an MPEG-4 scene description that
describes the virtual world of Fig. 1A;

Fig. 3 is a schematic simplified illustration of an MPEG-4 scene description that
describes the virtual world of Fig. 1E;

Fig. 4 shows the set of access units that convey the virtual world of Fig. 1A and all
subsequent changes of state up to and including state 5;

Fig. 5 shows a set of scene description frames, with their Synchronization Layer
(SL) header information, in a sample broadcast over an unreliable network of the virtual
world of Fig. 1A and all subsequent changes of state up to and including state 5,
according to the present invention; and

Fig. 6 is a flowchart describing the behavior of a receiving terminal when 
processing SL header information attached to access units of scene description, according 
to the present invention. 
DETAILED DESCRIPTION OF THE INVENTION

According to the present invention, two fields are defined in the SL-packet header: 

1. A numeration indicator; and

2. A synchronization point flag. 

Assume there are access units x and y such that access unit y arrives sometime
after access unit x, not necessarily consecutively. Assume also that if access unit x has
been successfully received and processed, then access unit y can also be processed, even
if all units in between have been lost. If this also holds for every access unit between
x and y, then all the access units from x to y, inclusive, do not change the fundamental
state of the scene and therefore, according to the present invention, are given the same
numeration indicator. Now assume that an access unit with its synchronization point flag
set to 1 exists in the scene description stream, and that this access unit conveys the same
status of the scene description as was conveyed by the preceding access units. The access
unit repeats this information for the benefit of receivers that missed the preceding access
unit, either because of data loss on the network or because a new user joined a broadcast
session. In this case, according to the present invention, this access unit is given the same
numeration indicator as the preceding one.

The numeration indicator and synchronization point flag fields of the present
invention may be new fields, added to the specification of the SL-packet header.
Alternatively, use may be made of existing fields, such that the AU_sequenceNumber
field of the SL-packet header serves as the numeration indicator and the
RandomAccessPointFlag (RAP) field serves as the synchronization point flag. The
following description uses the second embodiment by way of example.
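When the existing fields are reused, the numeration indicator inherits the finite width of AU_sequenceNumber. The sketch below is a hypothetical Python model, not an SL-packet parser: `SLHeaderFields` and `next_numeration` are illustrative names, and the field width is assumed to be a per-stream parameter (as we read ISO/IEC 14496-1, it is configured by AU_seqNumLength in the SL configuration; check the specification for the exact descriptor).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLHeaderFields:
    """The two SL-packet header fields reused by the second embodiment.

    au_sequence_number serves as the numeration indicator and
    random_access_point serves as the synchronization point flag.
    All other SL-packet header fields are omitted from this sketch.
    """
    au_sequence_number: int
    random_access_point: bool


def next_numeration(current: int, seq_num_length: int) -> int:
    """Advance the numeration indicator by one, wrapping at the field width.

    seq_num_length is the assumed bit width of AU_sequenceNumber. The receiver
    only compares values for equality, so wrapping is acceptable, but if a
    multiple of 2**seq_num_length fundamental changes were missed, a receiver
    could mistake the new state for the old one.
    """
    if seq_num_length < 1:
        raise ValueError("AU_sequenceNumber must be at least one bit wide")
    return (current + 1) % (1 << seq_num_length)
```

For example, with an 8-bit field `next_numeration(255, 8)` wraps to 0, while `next_numeration(1, 8)` gives 2.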
Reference is now made to Fig. 6, which is a flowchart describing the behavior of a
receiving terminal when processing SL header information attached to access units of
scene description, according to the present invention. In step 100 the terminal receives a
new access unit containing a scene description or object description frame. In step 110
the terminal checks whether it had successfully processed the access unit preceding the
one received in step 100. If it had, i.e. when no loss of data occurred prior to
receiving the new access unit, the terminal checks, in step 120, whether the value in the
AU_sequenceNumber field of the SL-packet header, labeled AU_#, has changed from the
preceding access unit. If the AU_# has changed, the new access unit is processed with no
further filtering in step 130. Otherwise, if the AU_# has not changed, the
RandomAccessPointFlag (RAP) field of the access unit is checked in step 140. Only if
the RAP is not set is the access unit processed, in step 150.

When step 110 shows that loss of data did occur prior to receiving the new access
unit, either because of data loss over an unreliable network or because a new user joins an
existing broadcast session, the terminal checks, in step 160, whether the AU_# value has
changed from the preceding access unit. If it has changed, the RandomAccessPointFlag
(RAP) field of the access unit is checked in step 170, and the new access unit is processed,
in step 150, only if the RAP is set, i.e. it is marked as a synchronization point. If it is not
marked as a synchronization point, the terminal must skip this access unit and all
subsequent access units that are not synchronization points, until an access unit that is a
synchronization point is received (step 180). If step 160 shows that the AU_# has not
changed from the previous access unit, the RandomAccessPointFlag (RAP) field of the
access unit is checked in step 140. Only if the RAP is not set is the access unit processed,
in step 150.
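The decision procedure of Fig. 6 can be sketched as a small state machine. This is a hypothetical Python model rather than code from the disclosure, and it makes two assumptions explicit. First, loss is detected from a packet counter (`packet_no`), standing in for whatever mechanism the terminal actually uses (for example the SL packet sequence number). Second, "the preceding access unit" in steps 120 and 160 is compared against the AU_# of the last access unit the terminal actually processed, and a terminal that is skipping units after step 180 is treated as not having processed the preceding unit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCESS = "process"  # steps 130 and 150: apply the access unit
    IGNORE = "ignore"    # step 140 with RAP set: redundant synchronization point
    SKIP = "skip"        # step 180: discard until a synchronization point arrives


@dataclass
class ReceiverState:
    last_au: Optional[int] = None          # AU_# of the last processed access unit
    expected_packet: Optional[int] = None  # next packet number expected (loss check)
    waiting_for_sync: bool = True          # True before tune-in and after step 180


def on_access_unit(state: ReceiverState, packet_no: int, au: int, rap: bool) -> Action:
    """Decide what to do with one access unit, following the Fig. 6 flowchart."""
    # Step 110: was the preceding access unit received and handled normally?
    preceding_ok = state.expected_packet == packet_no and not state.waiting_for_sync
    state.expected_packet = packet_no + 1

    if preceding_ok and au != state.last_au:
        action = Action.PROCESS                           # step 120 -> step 130
    elif au == state.last_au:
        # Step 120 or 160 found no change -> step 140.
        action = Action.IGNORE if rap else Action.PROCESS
    elif rap:
        action = Action.PROCESS                           # step 160 -> 170 -> 150
    else:
        action = Action.SKIP                              # step 170 -> step 180

    if action is Action.PROCESS:
        state.last_au = au
        state.waiting_for_sync = False
    elif action is Action.SKIP:
        state.waiting_for_sync = True
    return action
```

Feeding it the surviving frames of Fig. 5 when packets 2 and 5 are lost (packet numbers 1, 3, 4, 6 and 7) yields PROCESS, IGNORE, PROCESS, SKIP and PROCESS, matching the player behavior listed in Table 1. Comparing against the last processed AU_# is what lets packet 7 be applied: packet 6 was received but skipped, so its AU_# must not count as already known.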

Further, with respect to the example of Figs. 1A to 1E, reference is now made to
Fig. 5, which shows a set of scene description frames with their Synchronization Layer
(SL) header information, as can be used to convey the access units of Fig. 4 in a broadcast
scenario over an unreliable network, using the mechanism of the present invention. All
frames are accompanied by SL header information that includes, but is not limited to, two
fields: AU_sequenceNumber (labeled AU_# hereinafter) and RandomAccessPointFlag
(RAP hereinafter). The values of these fields in each of the frames are as follows:

Frame #1 contains the Scene Replace command that conveys the entire scene of
state 1. It has AU_# 1 and RAP set, because this is a synchronization point.

Frame #2 contains the command that changes the person's body position for the
first time. Its RAP field is not set, and because the change of position is not considered
fundamental, its AU_# is still 1.

Frame #3 is again a Scene Replace command that conveys the entire scene, up to
date at state 2. Its RAP field is set, because this is a synchronization point, and because it
conveys no new information its AU_# is still 1.

Frame #4 contains the command that changes the person's body position for the
second time. Its RAP field is not set, and, because the change of position is not
considered fundamental, its AU_# is still 1.

Frame #5 contains the command that inserts the globe into the scene. Its RAP field
is not set, and because the insertion of a new object is considered fundamental, its AU_#
is set to 2.

Frame #6 contains the command that changes the globe's position. Its RAP field is
not set, and, because the change of position is not considered fundamental, its AU_# is
still 2.

Frame #7 is again a Scene Replace command that conveys the entire scene, up to
date at state 5. Its RAP field is set, because this is a synchronization point, and because it
conveys no new information its AU_# is still 2.
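On the sending side, the frame list above follows from a simple numbering rule. In this Python sketch, whether a command is fundamental is supplied by the caller (here, inserting the globe is fundamental and position changes are not, as in the example); the `Command` type and `assign_sl_fields` function are hypothetical names, and wraparound of AU_sequenceNumber is ignored for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Command:
    """Hypothetical description of one scene description frame to be sent."""
    label: str
    scene_replace: bool = False  # full scene refresh, sent as a synchronization point
    fundamental: bool = False    # its loss would inhibit processing of later data


def assign_sl_fields(commands: List[Command], first_au: int = 1) -> List[Tuple[str, int, bool]]:
    """Return (label, AU_#, RAP) per frame under the numbering rule described above."""
    frames: List[Tuple[str, int, bool]] = []
    au = first_au
    for index, cmd in enumerate(commands):
        # AU_# changes if and only if the frame conveys a fundamental change;
        # the first frame simply starts the numbering at first_au.
        if index > 0 and cmd.fundamental:
            au += 1
        # RAP is set exactly on the Scene Replace frames.
        frames.append((cmd.label, au, cmd.scene_replace))
    return frames


fig5 = [
    Command("replace: state 1", scene_replace=True),
    Command("move body (first time)"),
    Command("replace: state 2", scene_replace=True),
    Command("move body (second time)"),
    Command("insert globe", fundamental=True),
    Command("move globe"),
    Command("replace: state 5", scene_replace=True),
]
```

Applied to `fig5`, it produces AU_# values 1, 1, 1, 1, 2, 2, 2 with RAP set on frames 1, 3 and 7, matching the frame list above.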

Further, with respect to the same example, Table 1 describes the behavior of a
receiving terminal (player) in a sample scenario in which the frames of Fig. 5 are
transmitted, but packets 2 and 5 are lost on the network. The table describes the operation
of the terminal in this sample scenario, according to the present invention.



Packet # | AU_sequenceNumber | RAP | Receiving Terminal Behavior
---------+-------------------+-----+------------------------------------------------------------
1        | 1                 | Yes | Player tunes in; this is a RAP, so the player starts processing AUs
2        | 1                 | No  | Packet lost
3        | 1                 | Yes | Synchronization point, ignored by the player because the loss of data was not fundamental
4        | 1                 | No  | Process update (even though it has the same number as the preceding AU)
5        | 2                 | No  | Packet lost
6        | 2                 | No  | Cannot process update, since it depends on a lost packet; wait for RAP
7        | 2                 | Yes | Synchronization point, processed by the player because of the previous loss of fundamental data

Table 1 - A sample behavior of a player
WHAT IS CLAIMED IS:

1. An error-resilient mechanism for processing scene description streams in
scenarios that involve transmission of interactive MPEG-4 scenes over unreliable
networks, using ISO/IEC 14496-1 protocol having SL-packetized streams, wherein each
SL packet has a header comprising access units having an AU_sequenceNumber field
and a RandomAccessPointFlag field, comprising:

defining a numeration indicator field in said SL packet header; 
defining a synchronization point flag field in said SL packet header; and 
incrementing the numeration indicator of an access unit in said SL packet header if
and only if said access unit conveys a fundamental scene change.

2. The mechanism of claim 1, wherein a scene change is considered fundamental 
if its loss inhibits correct processing of subsequent data. 

3. The mechanism of claim 1, wherein setting the synchronization point flag of a first
access unit to 1 and setting the numeration indicator of said first access unit equal to the
numeration indicator of a second access unit exactly preceding said first access unit
indicates no change of scene over said second access unit.

4. The mechanism of claim 3, wherein said setting the synchronization point flag of
said first access unit to 1 and said setting the numeration indicator of said first access unit
equal to the numeration indicator of said second access unit exactly preceding said first
access unit provides a synchronization point for lost data.

5. The mechanism of any one of claims 1 to 4, wherein said numeration indicator field is
defined as said AU_sequenceNumber field and said synchronization point flag field is
defined as said RandomAccessPointFlag field.
6. A method of processing scene description streams in scenarios that involve
transmission of interactive MPEG-4 scenes, in a receiving terminal, said scene description
streams using ISO/IEC 14496-1 protocol having SL-packetized streams, wherein each SL
packet has a header comprising access units having an AU_sequenceNumber field and a
RandomAccessPointFlag field, comprising the steps of:

defining a numeration indicator field in said SL packet header; 

defining a synchronization point flag field in said SL packet header; 

receiving a first access unit; 

checking if a second access unit, exactly preceding said first access unit has been
received;

checking if the numeration indicator of said first access unit is different from the 
numeration indicator of a last access unit received before said first access unit; 

processing said first access unit if said second access unit has been received and the
numeration indicator of said first access unit is different from the numeration indicator of
said last access unit;

checking if the synchronization point flag of said first access unit is set, if said second
access unit has been received and said numeration indicator of said first access unit is not
different from the numeration indicator of said last access unit, or if said second access unit
has not been received;

if said synchronization point flag is set, processing said first access unit if said second
access unit has not been received and the numeration indicator of said first access unit is
different from the numeration indicator of said last access unit; and
if said synchronization point flag is not set, processing said first access unit if said
second access unit has been received and the numeration indicator of said first access unit is
equal to the numeration indicator of said last access unit, or if said second access unit has not
been received and the numeration indicator of said first access unit is equal to the numeration
indicator of said last access unit.

7. The method of claim 6, wherein said numeration indicator field is defined as said
AU_sequenceNumber field and said synchronization point flag field is defined as said
RandomAccessPointFlag field.
FIG. 2

WO 02/091748    PCT/IL02/00324

FIG. 3